A Sample-Efficient Algorithm for Episodic Finite-Horizon MDP with Constraints
Abstract
Constrained Markov decision processes (CMDPs) formalize sequential decision-making problems whose objective is to minimize a cost function while satisfying constraints on various cost functions. In this paper, we consider the setting of episodic fixed-horizon CMDPs. We propose an online algorithm which leverages the linear programming formulation of repeated optimistic planning for finite-horizon CMDPs to provide a probably approximately correct (PAC) guarantee on the number of episodes needed to ensure a near-optimal policy, i.e., one whose resulting objective value is close to the optimal value and which satisfies the constraints within a low tolerance, with high probability. The number of episodes needed is shown to have a linear dependence on the sizes of the state and action spaces and a quadratic dependence on the time horizon and on the upper bound on the number of possible successor states for a state-action pair. Therefore, if this upper bound is much smaller than the size of the state space, the number of episodes needed becomes linear in the sizes of the state and action spaces and quadratic in the time horizon.
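For orientation, the linear programming formulation of a finite-horizon CMDP that the abstract refers to is commonly written over occupancy measures. The block below is a sketch of that standard known-model LP, not the paper's exact program. All symbols are assumed notation that does not appear in the abstract: horizon H, occupancy measure q_h(s,a), objective cost c_h, constraint costs d_{i,h} with thresholds \xi_i, initial distribution \mu, and transition kernel P_h. An optimistic-planning variant would presumably replace P_h with quantities estimated from data, but the abstract does not give those details.

```latex
\begin{align*}
\min_{q \ge 0} \quad & \sum_{h=1}^{H} \sum_{s,a} q_h(s,a)\, c_h(s,a) \\
\text{s.t.} \quad
& \sum_{h=1}^{H} \sum_{s,a} q_h(s,a)\, d_{i,h}(s,a) \le \xi_i,
  && i = 1, \dots, I, \\
& \sum_{a} q_1(s,a) = \mu(s),
  && \forall s, \\
& \sum_{a} q_{h+1}(s',a) = \sum_{s,a} P_h(s' \mid s,a)\, q_h(s,a),
  && \forall s',\; h = 1, \dots, H-1.
\end{align*}
% A (possibly randomized) policy is recovered from a feasible q by normalization:
% \pi_h(a \mid s) = q_h(s,a) \Big/ \sum_{a'} q_h(s,a') \quad \text{whenever the denominator is positive.}
```

The constraints force q to be a valid state-action visitation distribution at every step, so both the objective and the cost constraints are linear in q even though they are nonlinear in the policy.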
Related articles
Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning
Recently, there has been significant progress in understanding reinforcement learning in discounted infinite-horizon Markov decision processes (MDPs) by deriving tight sample complexity bounds. However, in many real-world applications, an interactive learning agent operates for a fixed or bounded period of time, for example tutoring students for exams or handling customer service requests. Such...
Finite horizon robust model predictive control with terminal cost constraints
In this paper, we develop a finite horizon model predictive control algorithm which is robust to modelling uncertainties. A moving average system matrix is constructed to capture modelling uncertainties and facilitate the future output prediction. The paper is mainly focused on the step tracking problem. Using linear matrix inequality techniques, the design is converted into a semi-definite opt...
Online Learning with Expert Advice and Finite-Horizon Constraints
In this paper, we study a sequential decision making problem. The objective is to maximize the average reward accumulated over time subject to temporal cost constraints. The novelty of our setup is that the rewards and constraints are controlled by an adverse opponent. To solve our problem in a practical way, we propose an expert algorithm that guarantees both a vanishing regret and a sublinear...
Finite-Horizon Markov Decision Processes with State Constraints
Markov Decision Processes (MDPs) have been used to formulate many decision-making problems in science and engineering. The objective is to synthesize the best decision (action selection) policies to maximize expected rewards (minimize costs) in a given stochastic dynamical environment. In many practical scenarios (multi-agent systems, telecommunication, queuing, etc.), the decision-making probl...
A memory-efficient algorithm for multiple sequence alignment with constraints
MOTIVATION: Recently, the concept of constrained sequence alignment was proposed to incorporate the knowledge of biologists about structures/functionalities/consensuses of their datasets into sequence alignment such that the user-specified residues/nucleotides are aligned together in the computed alignment. The currently developed programs use the so-called progressive approach to efficientl...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2021
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v35i9.16979